Neyman-Pearson classification under a strict constraint
Abstract
Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that, with high probability, simultaneously satisfies the following two properties: (i) its probability of type I error is below a pre-specified level, and (ii) its probability of type II error is close to the minimum possible. The proposed classifier is obtained by minimizing an empirical objective subject to an empirical constraint. The novelty of the method is that the classifier produced by this empirical problem is shown to satisfy the original constraint on type I error. This strict enforcement of the constraint has interesting consequences for the control of the type II error, and we develop new techniques to handle this situation. Finally, connections with chance constrained optimization are evident and are investigated.
Keywords: binary classification, Neyman-Pearson paradigm, anomaly detection, empirical constraint, empirical risk minimization, chance constrained optimization.
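The idea of minimizing an empirical objective subject to an empirical constraint can be illustrated with a small sketch. This is not the authors' procedure: it assumes only two base classifiers with outputs in [-1, 1], uses a hinge surrogate as the convex loss, replaces a proper convex solver with a grid search over the mixing weight, and tightens the target level by a hypothetical `slack` parameter so that the empirical constraint is stricter than the nominal level `alpha`. The names `hinge`, `np_combine`, `slack`, and `grid` are all illustrative.

```python
from statistics import mean


def hinge(x):
    """Convex surrogate loss phi(x) = max(0, 1 + x)."""
    return max(0.0, 1.0 + x)


def np_combine(outputs_class0, outputs_class1, alpha, slack, grid=101):
    """Grid-search the weight w of the combination h = w*h1 + (1-w)*h2.

    outputs_class0 / outputs_class1: lists of (h1(x), h2(x)) pairs, with
    values in [-1, 1], for samples whose true label is 0 / 1.
    Returns (w, empirical type II risk, empirical type I risk), or None
    if no weight on the grid satisfies the tightened constraint.
    """
    level = alpha - slack  # tightened level; slack is a hypothetical knob
    best = None
    for k in range(grid):
        w = k / (grid - 1)
        # Surrogate type I risk: penalizes class-0 points scored toward +1.
        risk0 = mean(hinge(w * a + (1 - w) * b) for a, b in outputs_class0)
        if risk0 > level:
            continue  # empirical constraint violated, skip this weight
        # Surrogate type II risk: penalizes class-1 points scored toward -1.
        risk1 = mean(hinge(-(w * a + (1 - w) * b)) for a, b in outputs_class1)
        if best is None or risk1 < best[1]:
            best = (w, risk1, risk0)
    return best


# Toy data: h1 never raises a false alarm but misses half of class 1;
# h2 detects all of class 1 but raises false alarms on half of class 0.
class0 = [(-1, -1), (-1, 1), (-1, -1), (-1, 1)]
class1 = [(-1, 1), (1, 1)]
result = np_combine(class0, class1, alpha=0.5, slack=0.25)
```

On the toy data, the search returns the feasible weight with the smallest surrogate type II risk, or `None` when the tightened level cannot be met by any weight on the grid.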
Related resources
A plug-in approach to Neyman-Pearson classification
The Neyman-Pearson (NP) paradigm in binary classification treats type I and type II errors with different priorities. It seeks classifiers that minimize type II error, subject to a type I error constraint under a user-specified level α. In this paper, plug-in classifiers are developed under the NP paradigm. Based on the fundamental Neyman-Pearson Lemma, we propose two related plug-in classifier...
Neyman-Pearson Classification, Convexity and Stochastic Constraints
Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them and obtain a new classifier that satisfies simultaneously the two following properties with high probability: (i) its probability of type I error is below a pre-specifie...
Neyman-Pearson Classification under High-Dimensional Settings
Most existing binary classification methods target the optimization of the overall classification risk and may fail to serve some real-world applications such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. The Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for handling asymmetric...
Detection and Classification of Heart Premature Contractions via α-Level Binary Neyman-Pearson Radius Test: A Comparative Study
The aim of this study is to introduce a new methodology for isolating ectopic rhythms in ambulatory electrocardiogram (ECG) Holter data via appropriate statistical analyses that impose a reasonable computational burden. First, the events of the ECG signal are detected and delineated using a robust wavelet-based algorithm. Then, using the Binary Neyman-Pearson Radius test, an appropriate classifie...
A survey on Neyman-Pearson classification and suggestions for future research
In statistics and machine learning, classification studies how to automatically learn to make good qualitative predictions (i.e., assign class labels) based on past observations. Examples of classification problems include email spam filtering, fraud detection, and market segmentation. Binary classification, in which the potential class label is binary, has arguably the most widely used machine lea...